Constrained Motif Discovery

Authors

  • Yasser Mohammad
  • Toyoaki Nishida
Abstract

The goal of motif discovery algorithms is to efficiently find unknown recurring patterns in time series. Most available algorithms cannot use domain knowledge in any way, which results in quadratic, or at best sub-quadratic, time and space complexity. This is a severe limitation for large time series datasets for which domain knowledge is available. In this paper we define the Constrained Motif Discovery problem, which allows domain knowledge to be incorporated into the motif discovery process. We also show that most unconstrained motif discovery problems can be converted into constrained motif discovery problems using a change point detection algorithm. We provide two algorithms for solving this problem and compare their performance to state-of-the-art motif discovery algorithms on a large set of synthetic time series. The proposed algorithms can achieve linear time and constant space complexity. They provided a four- to ten-fold increase in speed compared to two state-of-the-art motif discovery algorithms without loss of accuracy, and they were more robust at high noise levels.
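The conversion mentioned in the abstract, from an unconstrained problem to a constrained one via change point detection, can be illustrated with a minimal sketch. This is not either of the paper's two algorithms: the change score (absolute difference between the means of two adjacent windows), the greedy peak selection, the Euclidean comparison of windows that start exactly at a change point, and all function and parameter names (`change_scores`, `candidate_starts`, `best_constrained_pair`, `w`, `threshold`, `min_gap`, `m`) are assumptions chosen only to show the idea of restricting the search to a few candidate locations.

```python
import math


def change_scores(series, w):
    """Hypothetical change score: |mean of next w samples - mean of previous w|."""
    n = len(series)
    scores = [0.0] * n
    for t in range(w, n - w + 1):
        before = sum(series[t - w:t]) / w
        after = sum(series[t:t + w]) / w
        scores[t] = abs(after - before)
    return scores


def candidate_starts(scores, threshold, min_gap):
    """Greedily keep the highest-scoring points that are at least min_gap apart."""
    ranked = sorted(
        (t for t, s in enumerate(scores) if s >= threshold),
        key=lambda t: (-scores[t], t),
    )
    chosen = []
    for t in ranked:
        if all(abs(t - c) >= min_gap for c in chosen):
            chosen.append(t)
    return sorted(chosen)


def best_constrained_pair(series, starts, m):
    """Compare only windows of length m that begin at a candidate start.

    Returns (i, j, distance) for the closest non-overlapping pair, or None.
    """
    usable = [s for s in starts if s + m <= len(series)]
    best = None
    for a in range(len(usable)):
        for b in range(a + 1, len(usable)):
            i, j = usable[a], usable[b]
            if j - i < m:
                continue  # skip overlapping (trivial) matches
            d = math.sqrt(
                sum((series[i + k] - series[j + k]) ** 2 for k in range(m))
            )
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


def toy_series():
    """Toy data (assumption): one 10-sample pattern planted at indices 40 and 130."""
    motif = [5.0] * 5 + [3.0] * 5
    return [0.0] * 40 + motif + [1.0] * 80 + motif + [-2.0] * 60


if __name__ == "__main__":
    series = toy_series()
    starts = candidate_starts(change_scores(series, w=5), threshold=3.0, min_gap=5)
    print(starts, best_constrained_pair(series, starts, m=10))
```

On the toy series, the candidate set shrinks to three locations and the planted pair at indices 40 and 130 is returned with distance 0. In this sketch the pairwise comparison is quadratic only in the number of candidates rather than in the length of the series, which is the intuition behind the constrained formulation; the paper's own complexity claims refer to its algorithms, not to this code.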


Similar resources

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...


CPMD: A Matlab Toolbox for Change Point and Constrained Motif Discovery

Change Point Discovery (CPD) and Constrained Motif Discovery (CMD) are two essential problems in data mining with applications in many fields, including robotics, economics, and neuroscience. In this paper, we show that these two problems are related and report the development of a MATLAB Toolbox (CPMD) that encapsulates several useful algorithms including new variants to solve thes...


G-SteX: Greedy Stem Extension for Free-Length Constrained Motif Discovery

Most available motif discovery algorithms in real-valued time series find approximately recurring patterns of a known length without any prior information about their locations or shapes. In this paper, a new motif discovery algorithm is proposed that has the advantage of requiring no upper limit on the motif length. The proposed algorithm can discover multiple motifs of multiple lengths at onc...


Constrained multilinear detection for faster functional motif discovery

The GRAPH MOTIF problem asks whether a given multiset of colors appears on a connected subgraph of a vertex-colored graph. The fastest known parameterized algorithm for this problem is based on a reduction to the k-Multilinear Detection (k-MLD) problem: the detection of multilinear terms of total degree k in polynomials presented as circuits. We revisit k-MLD and define k-CMLD, a constrained ve...


Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells

Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including to what extent constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the 'enhanceosome' versus the 'TF collective' model of ...




Publication date: 2008